Information Filtering via Self-Consistent Refinement 



Jie Ren 1,2, Tao Zhou 1,3,4 and Yi-Cheng Zhang 1,4
1 Department of Physics, University of Fribourg, Chemin du Musée 3, 1700 Fribourg, Switzerland
2 Department of Physics and Centre for Computational Science and Engineering, 
National University of Singapore, Singapore 117542, Republic of Singapore 
3 Department of Modern Physics, University of Science and Technology of China, Hefei 230026, PR China 
4 Information Economy and Internet Research Laboratory, 
University of Electronic Science and Technology of China, Chengdu 610054, PR China 

Recommender systems are important tools that help people cope with information explosion
and overload. In this Letter, we develop a general framework named self-consistent refinement
and implement it by embedding two representative recommendation algorithms: similarity-based
and spectrum-based methods. Numerical simulations on a benchmark data set demonstrate that
the present method converges fast and can provide considerably better performance than the
standard methods.

Introduction. — The last few years have witnessed an explosion of information: the Internet
and the World Wide Web have brought us into a world of endless possibilities, where people may
choose from thousands of movies, millions of books, and billions of web pages. The amount of
information is increasing more quickly than our processing ability, so evaluating all these
alternatives and then making a choice becomes infeasible. As a consequence, an urgent problem
is how to automatically extract the hidden information and make personalized recommendations.
For example, Amazon.com uses one's purchase record to recommend books [1], and
AdaptiveInfo.com uses one's reading history to recommend news [2]. Motivated by its economic
and social significance, the design of efficient recommendation algorithms has become a joint
focus ranging from engineering science [3,4] to marketing practice [5,6], and from
mathematical analysis [7,8] to the physics community [9-16].

A recommender system, consisting of $N$ users and $M$ items, can be fully described by an
$N \times M$ rating matrix $R$, with $R_{i\alpha} \neq 0$ being the rating user $i$ gives to
item $\alpha$. If $i$ has not yet evaluated $\alpha$, $R_{i\alpha}$ is set as zero. The aim of
a recommender system, or of a recommendation algorithm, is to predict ratings for the items
that have not yet been rated. To evaluate the algorithmic accuracy, the given data set is
usually divided into two parts: a training set and a testing set. Only the information
contained in the training set can be used in the prediction. Denoting the predicted rating
matrix as $\tilde{R}$, the most commonly used measurement for the algorithmic accuracy, namely
the mean absolute error (MAE), is defined as:



$$\mathrm{MAE} = \frac{1}{S} \sum_{(i,\alpha)} \left| \tilde{R}_{i\alpha} - R^*_{i\alpha} \right|, \qquad (1)$$

where the subscript $(i, \alpha)$ runs over all the elements corresponding to the non-zero
ratings in the testing set, $R^*$ denotes the rating matrix for the testing set, and $S$ is
the number of non-zero ratings in $R^*$.
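
Eq. (1) can be sketched in a few lines of Python. This is only an illustration: it assumes the matrices are stored as nested lists with zero marking an unrated entry, and the function name `mean_absolute_error` and the toy numbers are hypothetical.

```python
def mean_absolute_error(predicted, test):
    """MAE of Eq. (1): average |prediction - rating| over the non-zero test entries."""
    total, count = 0.0, 0
    for pred_row, test_row in zip(predicted, test):
        for pred, actual in zip(pred_row, test_row):
            if actual != 0:               # zero means "not rated", so it is skipped
                total += abs(pred - actual)
                count += 1
    if count == 0:
        raise ValueError("test matrix contains no ratings")
    return total / count


# Toy data (illustrative only): 2 users, 3 items, 3 ratings in the testing set.
predicted = [[4.2, 3.0, 1.0], [2.5, 4.0, 5.0]]
test = [[5, 0, 0], [0, 3, 5]]
mae = mean_absolute_error(predicted, test)
```

Only the three rated test entries contribute, so the predictions for unrated entries do not affect the score.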

Thus far, the most accurate algorithms are content-based [17]. However, those methods are
practical only if the items have well-defined attributes, and those attributes can be
extracted automatically. Besides the content-based algorithms, the recommendation methods can
be classified into two main categories: similarity-based [18,19] and spectrum-based [20,21].
In this Letter, we propose a generic framework of self-consistent refinement (SCR) for
personal recommendation, which is implemented by embedding the similarity-based and
spectrum-based methods, respectively. Numerical simulations on a benchmark data set
demonstrate the significant improvement of algorithmic performance via SCR compared with the
standard methods.

Generic framework of SCR. — The similarity-based and 
spectrum-based algorithms, including their extensions, 
can be expressed in a generic matrix formula 



$$\tilde{R} = \mathcal{D}(R), \qquad (2)$$

where $R$ is the rating matrix obtained from the training set, $\tilde{R}$ the predicted
rating matrix, and $\mathcal{D}$ a matrix operator. This operator, $\mathcal{D}$, may be as
simple as a left-multiplying matrix, as used in the basic similarity-based method, or very
complicated, usually involving a latent optimization process, as in the case of rank-$k$
singular value decomposition (see below for details). Most previous works concentrated on the
design of the operator $\mathcal{D}$. In contrast, we propose a completely new scenario where
Eq. (2) is replaced by a SCR via iterations. Denoting the initial configuration
$R^{(0)} = R$ and the initial time step $k = 0$, a generic framework of SCR reads:

(i) Implement the operation $\mathcal{D}(R^{(k)})$;

(ii) Set the elements of $R^{(k+1)}$ as

$$R^{(k+1)}_{i\alpha} = \begin{cases} \mathcal{D}(R^{(k)})_{i\alpha}, & R_{i\alpha} = 0, \\ R_{i\alpha}, & R_{i\alpha} \neq 0. \end{cases} \qquad (3)$$

Then, set $k = k + 1$.

(iii) Repeat (i)(ii) until the difference between $R^{(k)}$ and $R^{(k-1)}$ (or, more
practically, the difference $|\mathrm{MAE}(k) - \mathrm{MAE}(k-1)|$) is smaller than a given
termination threshold.

Consider the matrix series $R^{(0)}, R^{(1)}, \cdots, R^{(T)}$ ($T$ denotes the last time
step) as a certain dynamics driven by the operator $\mathcal{D}$; all the elements
corresponding to the voted items (i.e. $R_{i\alpha} \neq 0$) can then be treated as the
boundary conditions giving expression to the known information. If $\tilde{R}$ is an ideal
prediction, then, taken itself as the known rating matrix, it should satisfy the
self-consistent condition $\tilde{R} = \mathcal{D}(\tilde{R})$. However, this equation does
not hold for the standard methods. In contrast, the convergent matrix $R^{(T)}$ is
self-consistent. Despite its simplicity, SCR leads to a great improvement compared with the
traditional case shown in Eq. (2).
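
The three steps of the generic framework can be sketched as a short Python loop. This is a minimal illustration under stated assumptions rather than the implementation used for the numerical results: matrices are nested lists with zero marking an unknown rating, the stopping rule uses the largest entry-wise change between successive matrices, and `row_mean_operator` is a deliberately trivial, hypothetical stand-in for the operator $\mathcal{D}$.

```python
def scr(rating, operator, tol=1e-6, max_steps=1000):
    """Self-consistent refinement: apply `operator`, then restore the known ratings."""
    current = [list(row) for row in rating]            # R^(0) = R
    for step in range(1, max_steps + 1):
        proposed = operator(current)                   # step (i): D(R^(k))
        updated = [                                    # step (ii): Eq. (3)
            [known if known != 0 else new for known, new in zip(r_row, p_row)]
            for r_row, p_row in zip(rating, proposed)
        ]
        change = max(
            abs(a - b)
            for u_row, c_row in zip(updated, current)
            for a, b in zip(u_row, c_row)
        )
        current = updated
        if change < tol:                               # step (iii): stop when stable
            return current, step
    return current, max_steps


def row_mean_operator(matrix):
    """Toy operator (an assumption for illustration): replace each entry by its row mean."""
    return [[sum(row) / len(row)] * len(row) for row in matrix]


rating = [[4, 0, 2]]                                   # one user, the middle item unrated
completed, steps = scr(rating, row_mean_operator)
```

For this toy operator the unknown entry $x$ settles at the self-consistent value satisfying $x = (4 + x + 2)/3$, i.e. $x = 3$, while a single application of the operator to the raw matrix would give 2.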

Similarity-based SCR. — The basic idea behind the 
similarity-based method is that a user who likes an item will also like other similar items
[19]. Taking into account the different evaluation scales of different users [12,16], we
subtract the corresponding user average from each evaluated entry in the matrix $R$ and get a
new matrix $R'$. The similarity between items $\alpha$ and $\beta$ is given by:

$$\Omega_{\alpha\beta} = \frac{\sum_{i \in U} R'_{i\alpha} R'_{i\beta}}{\sqrt{\sum_{i \in U} R'^2_{i\alpha} \sum_{i \in U} R'^2_{i\beta}}} \in [-1, 1], \qquad (4)$$

where $\langle R \rangle_i$ is the average evaluation of user $i$ and
$R'_{i\alpha} = R_{i\alpha} - \langle R \rangle_i$. $U$ denotes the set of users who evaluated
both items $\alpha$ and $\beta$. $\Omega_{\alpha\beta} \to 1$ means the items $\alpha$ and
$\beta$ are very similar, while $\Omega_{\alpha\beta} \to -1$ means the opposite case.

In the most widely applied similarity-based algorithm, namely collaborative filtering [22,23],
the predicted rating is calculated by using a weighted average:

$$\tilde{R}'_{i\alpha} = \frac{\sum_\beta \Omega_{\alpha\beta} \cdot R'_{i\beta}}{\sum_\beta |\Omega_{\alpha\beta}|}. \qquad (5)$$

The contribution of $\Omega_{\alpha\beta} \cdot R'_{i\beta}$ is positive if the signs of
$\Omega_{\alpha\beta}$ and $R'_{i\beta}$ are the same. That is to say, a person $i$ liking
item $\alpha$ may result from either of two situations: (i) the person $i$ likes the item
$\beta$, which is similar to item $\alpha$, or (ii) the person $i$ dislikes the item $\beta$,
which is opposite to item $\alpha$ (i.e. $\Omega_{\alpha\beta} < 0$). Note that, when
computing the predictions for a specific user $i$, we have to add the average rating of this
user, $\langle R \rangle_i$, back to $\tilde{R}'_{i\alpha}$.

Obviously, Eq. (5) can be rewritten in a matrix form for any given user $i$, as

$$\tilde{R}'_i = P \cdot R'_i, \qquad (6)$$

where $\tilde{R}'_i$ and $R'_i$ are $M$-dimensional column vectors denoting the predicted and
known ratings for user $i$, and $P$, with elements
$P_{\alpha\beta} = \Omega_{\alpha\beta} / \sum_\gamma |\Omega_{\alpha\gamma}|$, acts as the
transfer matrix. For simplicity, and where no confusion arises, we hereinafter drop the
subscript $i$ and the prime superscript. Since for each user the predicting operation can be
expressed in a matrix form, we can get the numerical results by directly using the general
framework of SCR, as shown in Eq. (3). However, we have to perform the matrix multiplication
for every user, which takes a long time, especially for huge-size recommender systems.
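
The standard (non-iterated) similarity-based prediction of Eqs. (4)-(6) can be sketched as follows. The sketch rests on assumptions that the text does not fix: ratings are nested lists with zero for "unrated", the self-similarity term $\beta = \alpha$ is left out of the sums, and all function names and the toy data are hypothetical.

```python
from math import sqrt


def center(ratings):
    """Return (R', user means): rated entries minus the user's mean, unrated entries 0.0."""
    centered, means = [], []
    for row in ratings:
        rated = [r for r in row if r != 0]
        mean = sum(rated) / len(rated) if rated else 0.0
        means.append(mean)
        centered.append([r - mean if r != 0 else 0.0 for r in row])
    return centered, means


def similarity(ratings, centered, a, b):
    """Eq. (4): similarity of items a and b over the users who rated both."""
    users = [i for i, row in enumerate(ratings) if row[a] != 0 and row[b] != 0]
    num = sum(centered[i][a] * centered[i][b] for i in users)
    den = sqrt(sum(centered[i][a] ** 2 for i in users)
               * sum(centered[i][b] ** 2 for i in users))
    return num / den if den > 0 else 0.0


def transfer_matrix(ratings, centered):
    """P[a][b] = Omega[a][b] / sum_b |Omega[a][b]|, self-term excluded (assumption)."""
    m = len(ratings[0])
    omega = [[similarity(ratings, centered, a, b) if a != b else 0.0
              for b in range(m)] for a in range(m)]
    matrix = []
    for row in omega:
        norm = sum(abs(w) for w in row)
        matrix.append([w / norm if norm > 0 else 0.0 for w in row])
    return matrix


def predict(ratings, user, item):
    """Eq. (5) for one entry, with the user's mean rating added back."""
    centered, means = center(ratings)
    P = transfer_matrix(ratings, centered)
    return means[user] + sum(P[item][b] * centered[user][b] for b in range(len(P)))


# Toy data (illustrative only): user 3 has not rated item 1.
ratings = [[5, 4, 1], [4, 5, 2], [1, 2, 5], [5, 0, 1]]
prediction = predict(ratings, user=3, item=1)
```

In this toy example item 1 is positively correlated with item 0 and negatively correlated with item 2, so a user who likes item 0 and dislikes item 2 receives a high predicted rating for item 1, matching situations (i) and (ii) described above.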

To get the analytical expression and reduce the computational complexity, for a given user, we
group its known ratings (as boundary conditions) and unknown ratings into $R_B$ and $R_U$,
respectively. Correspondingly, the matrix $P$ is re-arranged in the same order as $R$. For
this user, we can rewrite Eq. (6) in a sub-matrix multiplying form:

$$\begin{pmatrix} \tilde{R}_B \\ \tilde{R}_U \end{pmatrix} = \begin{pmatrix} P_{BB} & P_{BU} \\ P_{UB} & P_{UU} \end{pmatrix} \begin{pmatrix} R_B \\ R_U \end{pmatrix}. \qquad (7)$$

In the standard collaborative filtering [22,23], as shown in Eq. (5), the unknown vector,
$R_U$, is set as a zero vector. Therefore, the predicted vector, $\tilde{R}_U$, can be
expressed in a compact form:

$$\tilde{R}_U = P_{UB} \cdot R_B. \qquad (8)$$

Clearly, it only takes into account the direct correlations between the unknown and known
sets.

The solution Eq. (8) does not obey the self-consistent condition, for the free sub-vector
$R_U$ is not equal to $\tilde{R}_U$. Considering the self-consistent condition (i.e.
$\tilde{R}_U = R_U$), the predicted vector should obey the following equation:

$$\tilde{R}_U = P_{UB} R_B + P_{UU} \tilde{R}_U, \qquad (9)$$

whose solution reads:

$$\tilde{R}_U = (I - P_{UU})^{-1} P_{UB} R_B. \qquad (10)$$

This solution differs from the standard collaborative filtering by an additional factor
$(I - P_{UU})^{-1}$.

Since it may not be practical to directly invert $(I - P_{UU})$, especially for huge-size
$P_{UU}$, we come up with a simple and efficient iterative method: substitute the first
results $\tilde{R}_U$ for $R_U$ on the right-hand side of Eq. (6), and take $R_B$ as the fixed
boundary conditions. Then, get the second-step results for $\tilde{R}_U$, and substitute them
for $R_U$ again. Doing this repeatedly, at the $n$th step we get:

$$\tilde{R}_U = (I + P_{UU} + P_{UU}^2 + \cdots + P_{UU}^{n-1}) P_{UB} R_B. \qquad (11)$$

Since the dominant eigenvalue of $P_{UU}$ is smaller than 1, $P_{UU}^n$ converges
exponentially fast [24], and we can get the stable solution quickly within several steps.
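
The iteration can be sketched in Python for a single user as below. The sub-matrices are plain nested lists, and the function name and the small numerical example are assumptions chosen only so that the result can be checked by hand against Eq. (10).

```python
def iterate_unknown(P_ub, P_uu, r_b, steps):
    """Repeat R_U <- P_UB R_B + P_UU R_U, starting from R_U = 0 (cf. Eqs. (9) and (11))."""
    # Direct contribution of the known ratings, i.e. the standard prediction of Eq. (8).
    direct = [sum(p * r for p, r in zip(row, r_b)) for row in P_ub]
    r_u = [0.0] * len(P_uu)
    for _ in range(steps):
        r_u = [d + sum(p * r for p, r in zip(row, r_u))
               for d, row in zip(direct, P_uu)]
    return r_u


# Toy example (illustrative only): one known rating, two unknown ratings.
P_ub = [[0.5], [-0.75]]
P_uu = [[0.0, 0.5], [0.25, 0.0]]
r_b = [2.0]

first_step = iterate_unknown(P_ub, P_uu, r_b, steps=1)    # equals Eq. (8)
converged = iterate_unknown(P_ub, P_uu, r_b, steps=50)    # approaches Eq. (10)
```

Solving $(I - P_{UU}) \tilde{R}_U = P_{UB} R_B$ by hand for these numbers gives $\tilde{R}_U = (2/7, -10/7)$, which the iteration approaches, whereas the first step alone returns the standard prediction $(1, -1.5)$.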

In addition, besides the item-item similarity introduced here, the similarity-based method
can also be implemented analogously by using the user-user similarity [4]. The SCR can also
be embedded in that case, and gains much better algorithmic accuracy.

Spectrum-based SCR. — We here present a spectrum- 
based algorithm, which relies on the Singular Value De- 
composition (SVD) of the rating matrix. Analogously, 
we use the matrix with subtraction of average ratings, 
$R'$, instead of $R$. The SVD of $R'$ is defined as [25]:

$$R' = U \cdot S \cdot V^T, \qquad (12)$$

FIG. 1: (a) Prediction error vs. iteration step, with p = 0.9 fixed. (b) The comparison of
algorithmic accuracy between the standard similarity-based method and the similarity-based
SCR for different p.

FIG. 2: (a) Prediction error vs. iteration step, with p = 0.9 fixed. (b) The comparison of
algorithmic accuracy between the standard spectrum-based method and the spectrum-based SCR
for different p.



where $U$ is an $N \times N$ unitary matrix formed by the eigenvectors of $R'R'^T$, $S$ is an
$N \times M$ singular value matrix with nonnegative numbers in decreasing order on the
diagonal and zeros off the diagonal, and $V^T$ is an $M \times M$ unitary matrix formed by
the eigenvectors of $R'^T R'$. The number of positive diagonal elements in $S$ equals
rank($R'$).

We keep only the $k$ largest diagonal elements (also the $k$ largest singular values) to
obtain a reduced $k \times k$ matrix $S_k$, and then reduce the matrices $U$ and $V$
accordingly. That is to say, only the $k$ column vectors of $U$ and $k$ row vectors of $V^T$
corresponding to the $k$ largest singular values are kept. The reconstructed matrix reads:

$$R'_k = U_k \cdot S_k \cdot V_k^T, \qquad (13)$$

where $U_k$, $S_k$ and $V_k^T$ have dimensions $N \times k$, $k \times k$ and $k \times M$,
respectively. Note that Eq. (13) is no longer the exact decomposition of the original matrix
$R'$ (i.e., $R'_k \neq R'$), but the closest rank-$k$ matrix to $R'$ [26]. In other words,
$R'_k$ minimizes the Frobenius norm $\|R' - R'_k\|$ [27] over all rank-$k$ matrices. Previous
studies found that [28] the reduced dimensional approximation sometimes performs better than
the original matrix in information retrieval since it filters out the small singular values
that may be highly distorted by the noise.

Actually, each row of the $N \times k$ matrix $U_k\sqrt{S_k}$ represents the vector of the
corresponding user's tastes, and each row of the $M \times k$ matrix $V_k\sqrt{S_k}$
characterizes the features of the corresponding item. Therefore, the prediction of the
evaluation a user $i$ gives to an item $\alpha$ can be obtained by computing the inner
product of the $i$-th row of $U_k\sqrt{S_k}$ and the $\alpha$-th row of $V_k\sqrt{S_k}$:

$$\tilde{R}' = U_k\sqrt{S_k} \cdot (V_k\sqrt{S_k})^T = U_k\sqrt{S_k} \cdot \sqrt{S_k}^T V_k^T = U_k \cdot S_k \cdot V_k^T = R'_k. \qquad (14)$$

This derivation reproduces Eq. (13), and illuminates the reason for using SVD to extract
hidden information from the user-item rating matrix. The entry $\tilde{R}'_{i\alpha}$ is the
predicted rating of user $i$ on item $\alpha$.

An underlying assumption in the $k$-truncated SVD method is the existence of $k$ principal
attributes in both the user's tastes and the item's features. For example, a movie's
attributes may include the director, hero, heroine, plot, music, etc., and a user has his
personal tastes on each attribute. If a movie fits his tastes well, he will give a high
rating, otherwise a low rating. Denote the states of a user $i$ and an item $\alpha$ as:

$$\langle u_i| = (u_i^1, u_i^2, \cdots, u_i^k); \quad \langle v_\alpha| = (v_\alpha^1, v_\alpha^2, \cdots, v_\alpha^k), \qquad (15)$$

then we can estimate the evaluation of $i$ on $\alpha$ as the matching extent between their
tastes and features:

$$\tilde{R}_{i\alpha} = \langle u_i | v_\alpha \rangle. \qquad (16)$$

Therefore, we want to find a matrix $\tilde{R}$ that can be decomposed into $N$
$k$-dimensional taste vectors and $M$ $k$-dimensional feature vectors so that the
corresponding entries are exactly the same as the known ratings and, consequently, the other
entries are the predicted ratings.

However, the $k$-truncated SVD matrix is not self-consistent, for the elements corresponding
to the known ratings in $R'_k$ are not exactly the same as those in $R'$. A self-consistent
prediction matrix can be obtained via an iterative $k$-truncated SVD process by resetting
those elements back to the known values at each step. Referring to Eq. (3), the
spectrum-based SCR treats the known ratings as the boundary conditions, and uses the
$k$-truncated SVD as the matrix operator $\mathcal{D}$. The iteration will converge to a
stable matrix $\tilde{R}$, namely the predicted matrix.
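
A deliberately simplified Python sketch of this iteration is given below. It is not the procedure used for the numerical results: it fixes $k = 1$, skips the subtraction of user averages, and replaces a full SVD routine by a plain power iteration for the leading singular vector (adequate only for small matrices with nonnegative entries); all function names and the toy matrix are assumptions.

```python
from math import sqrt


def rank_one_approximation(A, sweeps=50):
    """Rank-1 truncated SVD of A, with the leading right singular vector from power iteration."""
    n, m = len(A), len(A[0])
    v = [1.0 / sqrt(m)] * m                     # uniform start; fine for nonnegative A
    for _ in range(sweeps):
        u = [sum(A[i][j] * v[j] for j in range(m)) for i in range(n)]   # u = A v
        w = [sum(A[i][j] * u[i] for i in range(n)) for j in range(m)]   # w = A^T u
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            return [[0.0] * m for _ in range(n)]
        v = [x / norm for x in w]
    u = [sum(A[i][j] * v[j] for j in range(m)) for i in range(n)]       # sigma * u_hat
    return [[u[i] * v[j] for j in range(m)] for i in range(n)]


def spectrum_scr_rank_one(ratings, steps=300):
    """Iterate Eq. (3) with the rank-1 truncated SVD as operator; zero marks 'unknown'."""
    current = [[float(x) for x in row] for row in ratings]
    for _ in range(steps):
        approx = rank_one_approximation(current)
        current = [
            [float(known) if known != 0 else new for known, new in zip(r_row, a_row)]
            for r_row, a_row in zip(ratings, approx)
        ]
    return current


# Toy matrix (illustrative only): the known entries are consistent with the rank-1
# matrix [[1, 2], [2, 4], [3, 6]], and the last entry is hidden.
ratings = [[1, 2], [2, 4], [3, 0]]
one_shot = rank_one_approximation([[float(x) for x in row] for row in ratings])
completed = spectrum_scr_rank_one(ratings)
```

On this toy matrix the single truncated SVD gives a hidden entry well below 6 because the zero placeholder distorts the fit, while the iterated version should move it close to 6, the only value compatible with a rank-1 matrix through the known entries.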

Numerical results. — To test the algorithmic accuracy, 
we use a benchmark data set, namely MovieLens [29]. The data consist of $N = 3020$ users,
$M = 1809$ movies, and $2.24 \times 10^5$ discrete ratings from 1 to 5. All the ratings are
sorted according to their time stamps. We set a fraction $p$ of earlier ratings as the
training set, and the remaining ratings (with later time stamps) as the testing set.
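
The time-ordered division into training and testing sets can be sketched as follows, assuming each rating is stored as a `(user, item, rating, timestamp)` tuple; this record layout and the function name `split_by_time` are illustrative assumptions, not the format of the MovieLens files.

```python
def split_by_time(records, p):
    """Earliest fraction p of the ratings -> training set, the rest -> testing set."""
    ordered = sorted(records, key=lambda rec: rec[3])   # sort by time stamp
    cut = int(p * len(ordered))
    return ordered[:cut], ordered[cut:]


# Toy records (illustrative only), deliberately out of chronological order.
records = [
    (0, 2, 5, 400),
    (1, 0, 3, 100),
    (0, 1, 4, 300),
    (2, 2, 1, 200),
]
training, testing = split_by_time(records, p=0.75)
```

Splitting by time rather than at random means that every testing rating is later than every training rating, so the prediction never uses information from the future.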

As shown in Figs. 1 and 2, both the similarity-based and spectrum-based SCRs converge very
fast, and sharply improve the algorithmic accuracy of the standard methods. In
spectrum-based methods, the parameter $k$ is not observable in the real system, thus we treat
it as a tunable parameter. The results displayed in Fig. 2 correspond to the optimal $k$ that
minimizes the prediction error. For different $p$, the optimal $k$ is different. Denote the
data density as $\rho = E/NM$, where $E$ is the number of ratings in the training set. The
spectrum-based SCR will converge only if $k$ is smaller than a threshold

$$k_c = \frac{N + M - 2}{2} - \sqrt{\left(\frac{N + M - 2}{2}\right)^2 - NM\rho} \approx \frac{NM\rho}{N + M - 2}.$$

Thus the search for the optimal $k$ can be restricted to the natural numbers not larger than
$k_c$. The mathematical derivation and numerical results about this threshold behavior, as
well as the sensitivity of algorithmic performance to $k$, will be discussed elsewhere.

Conclusions. — In this Letter, we proposed an algorithmic framework for recommender systems,
namely self-consistent refinement. This general framework is implemented by embedding two
representative recommendation algorithms: similarity-based and spectrum-based methods.
Numerical simulations on a benchmark data set demonstrate the significant improvement of
algorithmic accuracy compared with the standard algorithms. Actually, the spectrum-based SCR
has higher accuracy than the similarity-based one, but it requires an optimizing process on
the selection of the parameter $k$, and thus takes longer computational time.

Besides the similarity-based and spectrum-based methods, some new kinds of recommendation
algorithms that mimic certain physical dynamics, such as heat conduction [11] and mass
diffusion [13], have very recently been suggested as promising candidates for the next
generation of recommender systems, for they provide better algorithmic accuracy while having
lower computational complexity. It is worthwhile to emphasize that those two algorithms
[11,13] also belong to the framework of SCR: they are just two specific realizations of SCR
if one considers the matrix operator $\mathcal{D}$ as the conduction of heat or the exchange
of mass during one step. In fact, the SCR framework is of great generality, and any algorithm
that can be expressed in the form of Eq. (2) has the opportunity to be improved via iterative
SCR. Furthermore, the present method can be applied not only in recommender systems, but also
in many other subjects, such as data clustering, missing-data mining, detection of community
structure, pattern recognition, prediction of protein structure, and so on.

This work is partially supported by SBF (Switzerland) through project C05.0148 (Physics of
Risk), and by the Swiss National Science Foundation (205120-113842). T.Z. acknowledges NNSFC
under Grant Nos. 10635040 and 60744003, as well as the 973 Project 2006CB705500.



[1] G. Linden et al., IEEE Internet Computing 7, 76 (2003).
[2] D. Billsus et al., Commun. ACM 45, 34 (2002).
[3] J. L. Herlocker et al., ACM Trans. Inform. Syst. 22, 5 (2004).
[4] G. Adomavicius et al., IEEE Trans. Knowl. Data Eng. 17, 734 (2005).
[5] A. Ansari et al., J. Mark. Res. 37, 363 (2000).
[6] Y. P. Ying et al., J. Mark. Res. 43, 355 (2006).
[7] R. Kumar et al., J. Comput. Syst. Sci. 63, 42 (2001).
[8] J. O'Donovan et al., Proc. 10th Int'l Conf. Intell. User Interfaces (2005).
[9] S. Maslov et al., Phys. Rev. Lett. 87, 248701 (2001).
[10] P. Laureti et al., EPL 75, 1006 (2006).
[11] Y.-C. Zhang et al., Phys. Rev. Lett. 99, 154301 (2007).
[12] Y.-C. Zhang et al., EPL 80, 68003 (2007).
[13] T. Zhou et al., Phys. Rev. E 76, 046115 (2007).
[14] T. Zhou et al., EPL 81, 58004 (2008).
[15] C.-K. Yu et al., Physica A 371, 732 (2006).
[16] M. Blattner et al., Physica A 373, 753 (2007).
[17] M. J. Pazzani et al., Lect. Notes Comput. Sci. 4321, 325 (2007).
[18] J. A. Konstan et al., Commun. ACM 40, 77 (1997).
[19] B. Sarwar et al., Proc. 10th Int'l WWW Conf. (2001).
[20] D. Billsus et al., Proc. Int'l Conf. Machine Learning (1998).
[21] B. Sarwar et al., Proc. ACM WebKDD Workshop (2000).
[22] P. Resnick et al., Proc. Comput. Supported Cooperative Work Conf. (1994).
[23] J. S. Breese et al., Proc. 14th Conf. Uncertainty in Artificial Intelligence (1998).
[24] G. H. Golub et al., Matrix Computations (Baltimore, Johns Hopkins University Press, 1996).
[25] X. Zhang, Matrix Analysis and Applications (Beijing, Tsinghua University Press & Springer, 2004).
[26] R. A. Horn et al., Matrix Analysis (Cambridge University Press, 1985).
[27] The Frobenius norm (also called Euclidean norm, Schur norm or Hilbert-Schmidt norm) of a
matrix $A = \{a_{ij}\}$ is defined as $\|A\| = \left(\sum_i \sum_j a_{ij}^2\right)^{1/2}$.
[28] M. W. Berry et al., SIAM Rev. 37, 573 (1995).
[29] The MovieLens data can be downloaded from the website of GroupLens Research
(http://www.grouplens.org).



